Robust Character Labeling in Movie Videos: Data Resources and Self-Supervised Feature Adaptation
Authors
Abstract
Robust face clustering is a vital step in enabling computational understanding of visual character portrayal in media. Face clustering for long-form content is challenging because of variations in appearance and the lack of supporting large-scale labeled data. Our work in this paper focuses on two key aspects of this problem: the lack of domain-specific training or benchmark datasets, and adapting face embeddings learned on web images to long-form content, specifically movies. First, we present a dataset of over 169,000 face tracks curated from 240 Hollywood movies with weak labels on whether a pair of face tracks belong to the same or a different character. We propose an offline algorithm based on nearest-neighbor search in the embedding space to mine hard examples from these tracks. We then investigate triplet-loss and multiview correlation-based methods for adapting face embeddings to hard examples. Our experimental results highlight the usefulness of weakly labeled data for domain-specific feature adaptation. Overall, we find that multiview correlation-based adaptation yields more discriminative and robust face embeddings. Its performance on downstream face verification and clustering tasks is comparable to the state-of-the-art results in this domain. We also present the SAIL-Movie Character Benchmark corpus, developed to augment existing benchmarks. It consists of racially diverse actors and provides face-quality labels for subsequent error analysis. We hope that the large-scale datasets developed in this work can further advance automatic character labeling in videos. All resources are available freely at https://sail.usc.edu/~ccmi/multiface.
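The abstract names nearest-neighbor mining of hard examples and a triplet loss but gives no procedure, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes that a "hard positive" is a same-character pair that is not among each other's k nearest neighbors, and that a "hard negative" is a different-character pair that is. The function names, the brute-force search, the Euclidean distance, and the values of `k` and `margin` are all illustrative choices that do not come from the paper.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mine_hard_examples(embeddings, same_pairs, diff_pairs, k=2):
    """Return (hard_positives, hard_negatives) as sorted lists of (i, j), i < j.

    Assumed definitions (not taken from the paper):
      hard positive: weakly labeled "same character", yet neither track is
                     among the other's k nearest neighbors.
      hard negative: weakly labeled "different character", yet one track is
                     among the other's k nearest neighbors.
    """
    n = len(embeddings)
    # Brute-force k-nearest-neighbor search; fine for a toy example.
    neighbors = []
    for i in range(n):
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: euclidean(embeddings[i], embeddings[j]),
        )
        neighbors.append(set(order[:k]))

    def is_near(i, j):
        return j in neighbors[i] or i in neighbors[j]

    hard_pos = sorted(
        (min(i, j), max(i, j)) for i, j in same_pairs if not is_near(i, j)
    )
    hard_neg = sorted(
        (min(i, j), max(i, j)) for i, j in diff_pairs if is_near(i, j)
    )
    return hard_pos, hard_neg


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard hinge-style triplet loss on Euclidean distances."""
    return max(
        0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin
    )


if __name__ == "__main__":
    # Toy 2-D "track embeddings" and weak pair labels (made up for illustration).
    tracks = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.2, 0.1), (5.1, 5.0)]
    same = [(0, 1), (0, 2)]   # pairs weakly labeled as the same character
    diff = [(0, 3), (1, 4)]   # pairs weakly labeled as different characters
    print(mine_hard_examples(tracks, same, diff, k=2))
```

In this sketch, the mined pairs would then form the triplets whose loss is minimized during adaptation; a real pipeline at the scale the abstract describes would replace the brute-force loop with an indexed nearest-neighbor search.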
Similar resources
Weakly Supervised Action Labeling in Videos under Ordering Constraints
We are given a set of video clips, each one annotated with an ordered list of actions, such as “walk” then “sit” then “answer phone” extracted from, for example, the associated text script. We seek to temporally localize the individual actions in each clip as well as to learn a discriminative classifier for each action. We formulate the problem as a weakly supervised temporal assignment with or...
Reblur2Deblur: Deblurring Videos via Self-Supervised Learning
Motion blur is a fundamental problem in computer vision as it impacts image quality and hinders inference. Traditional deblurring algorithms leverage the physics of the image formation model and use hand-crafted priors: they usually produce results that better reflect the underlying scene, but present artifacts. Recent learning-based methods implicitly extract the distribution of natural images...
Robust ASR model adaptation by feature-based statistical data mapping
Automatic speech recognition (ASR) model adaptation is important to many real-life ASR applications due to the variability of speech. Differences in speaker, bandwidth, context, channel, and so on between the speech databases used for the initial ASR models and the application data can be major obstacles to the effectiveness of ASR models. ASR models, therefore, need to be adapted to the application environm...
Feature Generation for Robust Semantic Role Labeling
Hand-engineered feature sets are a well understood method for creating robust NLP models, but they require a lot of expertise and effort to create. In this work we describe how to automatically generate rich feature sets from simple units called featlets, requiring less engineering. Using information gain to guide the generation process, we train models which rival the state of the art on two s...
Feature-based Encoding and Querying Language Resources with Character Semantics
In this paper we discuss the explicit representation of character features pertaining to written language resources, which we argue are critically necessary in the long term of archiving language data. Much focus on the creation of language resources and their associated preservation is at the level of the corpus itself; however it is generally accepted that long term interpretation of these la...
Journal
Journal title: IEEE Transactions on Multimedia
Year: 2022
ISSN: ['1520-9210', '1941-0077']
DOI: https://doi.org/10.1109/tmm.2021.3096155